Abstract We tackle the inverse problem of reconstructing an unknown finite measure $\mu$ from a noisy observation of a generalized moment of $\mu$, defined as the integral of a continuous and bounded operator $\Phi$ with respect to $\mu$. When only a quadratic approximation $\Phi_m$ of the operator is known, we introduce the $L^2$ approximate maximum entropy solution as a minimizer of a convex functional subject to a sequence of convex constraints. Under several assumptions on the convex functional, the convergence of the approximate solution is established and rates of convergence are provided.



1 Introduction 

A number of inverse problems may be stated in the form of reconstructing an unknown measure $\mu$ from observations of generalized moments of $\mu$, i.e., moments $y$ of the form

$$y = \int_{\mathcal X} \Phi(x)\,d\mu(x),$$

where $\Phi : \mathcal X \to \mathbb R^k$ is a given map. Such problems are encountered in various fields of science, like medical imaging, time-series analysis, speech processing, image restoration from a blurred version of the image, spectroscopy, geophysical sciences, crystallography, and tomography; see for example Decarreau et al (1992), Gzyl (2002), Hermann and Noll (2000), and Skilling (1988). Recovering the unknown measure $\mu$ is generally an ill-posed problem, which turns out to be difficult to solve in the presence of noise, i.e., when one observes $y^{obs}$ given by

$$y^{obs} = \int_{\mathcal X} \Phi(x)\,d\mu(x) + \varepsilon. \tag{1}$$



For inverse problems with known operator $\Phi$, regularization techniques allow the solution to be stabilized by giving favor to those solutions which minimize a regularizing functional $J$, i.e., one minimizes $J(\mu)$ over $\mu$ subject to the constraint that $\int_{\mathcal X} \Phi(x)\,d\mu(x) = y$ when $y$ is observed, or $\int_{\mathcal X} \Phi(x)\,d\mu(x) \in K_Y$ in the presence of noise, for some convex set $K_Y$ containing $y^{obs}$. Several types of regularizing functionals have been introduced in the literature. In this general setting, the inversion procedure is deterministic, i.e., the noise distribution is not used in the definition of the regularized solution. Bayesian approaches to inverse problems allow one to handle the noise distribution, provided it is known, yet in general a distribution like the normal distribution is postulated (see Evans and Stark, 2002 for a survey). However, in many real-world inverse problems, the noise distribution is unknown, and only the output $y$ is easily observable, contrary to the input to the operator. Consequently, very few paired data are available to reliably estimate the noise distribution, thereby causing robustness deficiencies on the retrieved parameters. Nonetheless, even if the noise distribution is unavailable to the practitioner, she often knows the noise level, i.e., the maximal magnitude of the disturbance term, say $\rho > 0$, and this information may be reflected by taking a constraint set $K_Y$ of diameter $2\rho$.

As an alternative to standard regularizations such as Tikhonov or Galerkin (see for instance Engl, Hanke and Neubauer, 1996), we focus on a regularization functional with grounding in information theory, generally expressed as a negative entropy, leading to maximum entropy solutions to the inverse problem. In a deterministic framework, maximum entropy solutions have been studied in Borwein and Lewis (1993, 1996), while other studies exist in a Bayesian setting (Gamboa, 1999; Gamboa and Gassiat, 1997), in seismic tomography (Fermin, Loubes and Ludena, 2006), and in image analysis (Gzyl and Zeev, 2003; Skilling and Gull, 2001). Regularization with maximum entropy also provides one with a very simple and natural manner to incorporate constraints on the support and the range of the solution (see e.g. the discussion in Gamboa and Gassiat, 1997).

In many actual situations, however, the map $\Phi$ is unknown and only an approximation to it is available, say $\Phi_m$, which converges in quadratic norm to $\Phi$ as $m$ goes to infinity. In this paper, following lines devised in Gamboa (1999), Gamboa and Gassiat (1999), and Loubes and Pelletier (2008), we introduce an approximate maximum entropy on the mean (AMEM) estimate $\hat\mu_{m,n}$ of the measure $\mu_X$ to be reconstructed. This estimate is expressed in the form of a discrete measure concentrated on $n$ points of $\mathcal X$. In our main result, we prove that $\hat\mu_{m,n}$ converges to the solution of the initial inverse problem as $m \to \infty$ and $n \to \infty$, and we provide a rate of convergence for this estimate.
The paper is organized as follows. Section 2 introduces some notation and the definition of the AMEM estimate. In Section 3, we state our main result (Theorem 2). Section 4 is devoted to the proofs of our results.



2 Notation and definitions 
2.1 Problem statement

Let $\Phi$ be a continuous and bounded map defined on a subset $\mathcal X$ of $\mathbb R^d$ and taking values in $\mathbb R^k$. The set of finite measures on $(\mathcal X, \mathcal B(\mathcal X))$ will be denoted by $\mathcal M(\mathcal X)$, where $\mathcal B(\mathcal X)$ denotes the Borel $\sigma$-field of $\mathcal X$. Let $\mu_X \in \mathcal M(\mathcal X)$ be an unknown finite measure on $\mathcal X$ and consider the following equation:

$$y = \int_{\mathcal X} \Phi(x)\,d\mu_X(x). \tag{2}$$

Suppose that we observe a perturbed version $y^{obs}$ of the response $y$:

$$y^{obs} = \int_{\mathcal X} \Phi(x)\,d\mu_X(x) + \varepsilon,$$

where $\varepsilon$ is an error term supposed bounded in norm from above by some positive constant $\eta$, representing the maximal noise level. Based on the data $y^{obs}$, we aim at reconstructing the measure $\mu_X$ with a maximum entropy procedure. As explained in the introduction, the true map $\Phi$ is unknown and we assume knowledge of an approximating sequence $\Phi_m$ to the map $\Phi$, such that

$$\|\Phi_m - \Phi\|_{L^2} = \sqrt{\mathbb E\left(\|\Phi_m(X) - \Phi(X)\|^2\right)} \to 0,$$

at a rate $\varphi_m$.

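Let us first introduce some notation. For any probability measure $\nu$ on $\mathbb R^n$, we shall denote by $\mathcal L_\nu$, $\Lambda_\nu$, and $\Lambda^*_\nu$ the Laplace, log-Laplace, and Cramér transforms of $\nu$, respectively defined for all $s \in \mathbb R^n$ by:

$$\mathcal L_\nu(s) = \int_{\mathbb R^n} \exp\langle s, x\rangle\,d\nu(x),$$

$$\Lambda_\nu(s) = \log \mathcal L_\nu(s),$$

$$\Lambda^*_\nu(s) = \sup_{u \in \mathbb R^n} \left\{\langle s, u\rangle - \Lambda_\nu(u)\right\}.$$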
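To make these transforms concrete, here is a small sketch for one illustrative choice that is not made in the text: $\nu$ = Bernoulli(1/2) on $\{0, 1\}$. Then $\Lambda_\nu(s) = \log((1 + e^s)/2)$, and the Cramér transform has the closed form $\Lambda^*_\nu(t) = t\log t + (1-t)\log(1-t) + \log 2$ on $(0, 1)$. The sketch computes the supremum numerically, by bisection on $\Lambda_\nu'$ (valid because $u \mapsto tu - \Lambda_\nu(u)$ is concave), and compares it with the closed form. All function names are hypothetical.

```python
import math

def log_laplace(s):
    """Lambda_nu(s) = log E[exp(s Z)] = log((1 + e^s) / 2) for Z ~ Bernoulli(1/2)."""
    # log(1 + e^s) computed stably as max(s, 0) + log1p(e^{-|s|})
    return max(s, 0.0) + math.log1p(math.exp(-abs(s))) - math.log(2.0)

def log_laplace_prime(s):
    """Lambda_nu'(s) = e^s / (1 + e^s), the logistic function (increasing in s)."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    return math.exp(s) / (1.0 + math.exp(s))

def cramer(t, lo=-50.0, hi=50.0, iters=200):
    """Lambda_nu^*(t) = sup_u {t u - Lambda_nu(u)} for t in (0, 1).

    The objective is concave in u, so the supremum is attained where
    Lambda_nu'(u) = t; that point is located by bisection on [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_laplace_prime(mid) < t:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    return t * u - log_laplace(u)

def cramer_closed_form(t):
    """Closed form t log t + (1 - t) log(1 - t) + log 2 for Bernoulli(1/2)."""
    return t * math.log(t) + (1.0 - t) * math.log(1.0 - t) + math.log(2.0)

for t in (0.1, 0.5, 0.9):
    print(t, cramer(t), cramer_closed_form(t))
```

The two columns agree, and $\Lambda^*_\nu$ vanishes at the mean $t = 1/2$ of $\nu$, as a Cramér transform should.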

Define the set

$$K_Y = \{y \in \mathbb R^k : \|y - y^{obs}\| \le \eta\},$$
i.e., $K_Y$ is the closed ball centered at the observation $y^{obs}$ and of radius $\eta$.
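Let $\mathcal X$ be a set, and let $\mathcal P(\mathcal X)$ be the set of probability measures on $\mathcal X$. For $\nu, \mu \in \mathcal P(\mathcal X)$, the relative entropy of $\nu$ with respect to $\mu$ is defined by

$$H(\nu|\mu) = \begin{cases} \displaystyle\int_{\mathcal X} \log\left(\frac{d\nu}{d\mu}\right) d\nu & \text{if } \nu \ll \mu,\\ +\infty & \text{otherwise.}\end{cases}$$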
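For distributions on a finite set, the definition reduces to a finite sum. A minimal sketch (the function name is hypothetical) that returns $+\infty$ exactly when $\nu$ charges a point that $\mu$ does not, i.e. when $\nu \not\ll \mu$:

```python
import math

def relative_entropy(nu, mu):
    """H(nu | mu) for two probability vectors on the same finite set.

    Returns +inf when nu is not absolutely continuous w.r.t. mu,
    i.e. when nu puts mass on a point where mu vanishes.
    """
    total = 0.0
    for p, q in zip(nu, mu):
        if p == 0.0:
            continue  # convention 0 log 0 = 0
        if q == 0.0:
            return math.inf  # nu is not << mu
        total += p * math.log(p / q)
    return total

print(relative_entropy([0.5, 0.5], [0.25, 0.75]))  # equals 0.5 * log(4/3)
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))    # inf
```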

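Given a set $\mathcal C \subset \mathcal P(\mathcal X)$ and a probability measure $\mu \in \mathcal P(\mathcal X)$, an element $\mu^*$ of $\mathcal C$ is called an I-projection of $\mu$ on $\mathcal C$ if

$$H(\mu^*|\mu) = \inf_{\nu \in \mathcal C} H(\nu|\mu).$$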

Now we let $\mathcal X$ be a locally convex topological vector space of finite dimension. The dual of $\mathcal X$ will be denoted by $\mathcal X'$. The following two theorems, due to Csiszár (1984), characterize the entropic projection of a given probability measure on a convex set. For their proofs, see Theorem 3 and Lemma 3.3 in Csiszár (1984), respectively.

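Theorem 1. Let $\mu$ be a probability measure on $\mathcal X$. Let $\mathcal C$ be a convex subset of $\mathcal X$ whose interior has a non-empty intersection with the convex hull of the support of $\mu$. Let

$$\Pi(\mathcal C) = \left\{P \in \mathcal P(\mathcal X) : \int_{\mathcal X} x\,dP(x) \in \mathcal C\right\}.$$

Then the I-projection $\mu^*$ of $\mu$ on $\Pi(\mathcal C)$ is given by the relation

$$d\mu^*(x) = \frac{\exp \lambda^*(x)}{\int_{\mathcal X} \exp \lambda^*(u)\,d\mu(u)}\,d\mu(x),$$

where $\lambda^* \in \mathcal X'$ is given by

$$\lambda^* = \arg\max_{\lambda \in \mathcal X'} \left\{\inf_{x \in \mathcal C} \lambda(x) - \log \int_{\mathcal X} \exp \lambda(x)\,d\mu(x)\right\}.$$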
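Theorem 1 states that the I-projection is an exponential tilt of $\mu$. The sketch below illustrates this in the simplest case that can be checked by hand, assuming (for illustration only) that $\mathcal X = \mathbb R$, that $\mu$ is a discrete distribution on finitely many points, and that $\mathcal C = [a, b]$ is an interval, so that $\lambda(x) = \lambda x$. When the mean of $\mu$ lies below $a$, the maximizer satisfies $\lambda^* > 0$ and the tilted law has mean exactly $a$ (symmetrically, $\lambda^* < 0$ and mean $b$ when the mean lies above $b$). The code locates $\lambda^*$ by bisection on the tilted mean, which is increasing in $\lambda$; the function names are hypothetical.

```python
import math

def tilt(points, weights, lam):
    """Exponentially tilted weights w_i exp(lam x_i) / sum_j w_j exp(lam x_j)."""
    shift = max(lam * x for x in points)  # subtract the max exponent for numerical stability
    raw = [w * math.exp(lam * x - shift) for x, w in zip(points, weights)]
    z = sum(raw)
    return [r / z for r in raw]

def mean(points, weights):
    return sum(x * w for x, w in zip(points, weights))

def i_projection_interval(points, weights, a, b, iters=200):
    """I-projection of a discrete law on {P : mean(P) in [a, b]}.

    Assumes the interior of [a, b] meets the convex hull of the support, and that
    the active endpoint lies strictly inside (min(points), max(points)).
    Returns (lam_star, projected_weights).
    """
    m0 = mean(points, weights)
    if a <= m0 <= b:
        return 0.0, list(weights)  # mu already belongs to Pi(C)
    target = a if m0 < a else b
    # The tilted mean is increasing in lam: bracket the root on the correct side of 0.
    lo, hi = (0.0, 1.0) if m0 < a else (-1.0, 0.0)
    while mean(points, tilt(points, weights, hi)) < target:
        hi *= 2.0
    while mean(points, tilt(points, weights, lo)) > target:
        lo *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(points, tilt(points, weights, mid)) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, tilt(points, weights, lam)

points, weights = [0.0, 1.0, 2.0, 3.0], [0.25] * 4   # mu uniform, mean 1.5
lam_star, proj = i_projection_interval(points, weights, 2.0, 3.0)
print(lam_star, proj, mean(points, proj))            # lam_star > 0, projected mean 2.0
```

The projected law keeps the full support of $\mu$ and reweights it towards larger points, rather than moving all mass onto the point 2 (which would also satisfy the constraint, at a much higher entropy cost).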



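Now let $\nu_Z$ be a probability measure on $\mathbb R_+$. Let $P_X$ be a probability measure on $\mathcal X$ having full support, and define the convex functional $I_{\nu_Z}(\mu|P_X)$ by:

$$I_{\nu_Z}(\mu|P_X) = \begin{cases} \displaystyle\int_{\mathcal X} \Lambda^*_{\nu_Z}\left(\frac{d\mu}{dP_X}\right) dP_X & \text{if } \mu \ll P_X,\\ +\infty & \text{otherwise.}\end{cases}$$

Within this framework, we consider as a solution of the inverse problem (2) a minimizer of the functional $I_{\nu_Z}(\mu|P_X)$ subject to the constraint

$$\mu \in S(K_Y) = \left\{\mu \in \mathcal M(\mathcal X) : \int_{\mathcal X} \Phi(x)\,d\mu(x) \in K_Y\right\}.$$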
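As a worked example of the functional and of the constraint, take the following hypothetical setting: $\mathcal X = [0, 1]$, $P_X$ uniform, $\Phi(x) = (1, x)$, $\nu_Z$ = Bernoulli(1/2), $K_Y$ the ball of radius $0.01$ around $(1/2, 1/3)$, and the candidate density $d\mu/dP_X(x) = x$. Then $\Lambda^*_{\nu_Z}(t) = t\log t + (1-t)\log(1-t) + \log 2$ on $[0, 1]$ and $+\infty$ elsewhere, so that
$I_{\nu_Z}(\mu|P_X) = \int_0^1 \left(t\log t + (1-t)\log(1-t)\right)dt + \log 2 = \log 2 - 1/2$, while $\int_{\mathcal X}\Phi\,d\mu = (1/2, 1/3)$. The sketch checks both values with a midpoint rule. It also shows how the choice of $\nu_Z$ restricts the range of admissible densities (here to $[0, 1]$), in line with Remark 1.

```python
import math

def cramer_bernoulli(t):
    """Lambda*_{nu_Z}(t) for the hypothetical choice nu_Z = Bernoulli(1/2).

    Finite on [0, 1] (with the convention 0 log 0 = 0) and +inf outside.
    """
    if t < 0.0 or t > 1.0:
        return math.inf
    ent = sum(p * math.log(p) for p in (t, 1.0 - t) if p > 0.0)
    return ent + math.log(2.0)

def entropy_functional(density, n_grid=100_000):
    """I_{nu_Z}(mu | P_X) for d mu = density(x) dP_X, P_X = Uniform[0, 1] (midpoint rule)."""
    h = 1.0 / n_grid
    return h * sum(cramer_bernoulli(density((i + 0.5) * h)) for i in range(n_grid))

def moment(density, n_grid=100_000):
    """int Phi(x) d mu(x) for the toy map Phi(x) = (1, x) (midpoint rule)."""
    h = 1.0 / n_grid
    xs = [(i + 0.5) * h for i in range(n_grid)]
    return (h * sum(density(x) for x in xs), h * sum(x * density(x) for x in xs))

def in_ball(y, center, radius):
    """Membership test for K_Y, the closed ball of radius eta around y_obs."""
    return math.dist(y, center) <= radius

g = lambda x: x  # candidate density d mu / d P_X
print(entropy_functional(g), math.log(2.0) - 0.5)
print(in_ball(moment(g), (0.5, 1.0 / 3.0), 0.01))
```

A density taking values outside $[0, 1]$, for instance the constant $1.5$, gets $I_{\nu_Z}(\mu|P_X) = +\infty$ under this choice of $\nu_Z$.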
We introduce the approximate maximum entropy on the mean (AMEM) estimate as a sequence $\hat\mu_{m,n}$ of discrete measures on $\mathcal X$. In all of the following, the integer $m$ indexes the approximating sequence $\Phi_m$ to $\Phi$, while the integer $n$ indexes a random discretization of the space $\mathcal X$. For the construction of the AMEM estimate, we proceed as follows.

Let $(X_1, \ldots, X_n)$ be an i.i.d. sample drawn from $P_X$. Thus the empirical measure $\frac{1}{n}\sum_{i=1}^n \delta_{X_i}$ converges weakly to $P_X$.

Let $L_n$ be the discrete measure with random weights defined by

$$L_n = \frac{1}{n}\sum_{i=1}^n Z_i\,\delta_{X_i},$$

where $(Z_i)_i$ is a sequence of i.i.d. random variables on $\mathbb R$ with common distribution $\nu_Z$.

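For $\mathcal S$ a set, we denote by $\mathrm{co}\,\mathcal S$ its convex hull. Let $\Omega_{m,n}$ be the probability event defined by

$$\Omega_{m,n} = \left[K_Y \cap \mathrm{co}\,\mathrm{Supp}\,F_*\nu_Z^{\otimes n} \neq \emptyset\right] \tag{3}$$

where $F : \mathbb R^n \to \mathbb R^k$ is the linear operator associated with the matrix $A_{m,n} = \frac{1}{n}\left(\Phi_m^i(X_j)\right)_{(i,j) \in [1,k] \times [1,n]}$ and where $F_*\nu_Z^{\otimes n}$ denotes the image measure of $\nu_Z^{\otimes n}$ by $F$. For ease of notation, the dependence of $F$ on $m$ and $n$ will not be explicitly written throughout.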
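To make event (3) concrete, here is a sketch under one hypothetical choice, $\nu_Z$ = Bernoulli(1/2). Then $\mathrm{Supp}\,\nu_Z^{\otimes n} = \{0, 1\}^n$, whose convex hull is $[0, 1]^n$, and since $F$ is linear, $\mathrm{co}\,\mathrm{Supp}\,F_*\nu_Z^{\otimes n} = \{A_{m,n}z : z \in [0, 1]^n\}$. The event $\Omega_{m,n}$ therefore holds if and only if $\min_{z \in [0,1]^n}\|A_{m,n}z - y^{obs}\| \le \eta$. The code approximates that minimum by projected gradient descent, a generic method chosen here for simplicity and not one used in the text. A `True` answer is exact, since it exhibits a feasible $z$; a `False` answer is only as reliable as the optimization.

```python
import numpy as np

def omega_event(phi_m, y_obs, eta, n_iter=5000):
    """Approximate test of the event Omega_{m,n} for nu_Z = Bernoulli(1/2) (hypothetical choice).

    phi_m is the (n, k) array of the values Phi_m(X_j). With A = A_{m,n}, the event
    holds iff the minimum over z in [0, 1]^n of ||A z - y_obs|| is at most eta.
    We minimize 0.5 ||A z - y_obs||^2 by projected gradient descent, step 1 / ||A||_2^2.
    """
    n = phi_m.shape[0]
    A = phi_m.T / n                          # k x n matrix (1/n) (Phi_m^i(X_j))_{i,j}
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # inverse Lipschitz constant of the gradient
    z = np.full(n, 0.5)
    for _ in range(n_iter):
        z = np.clip(z - step * (A.T @ (A @ z - y_obs)), 0.0, 1.0)  # project onto the box
    return bool(np.linalg.norm(A @ z - y_obs) <= eta)

rng = np.random.default_rng(4)
n = 50
x = rng.uniform(0.0, 1.0, size=n)                 # X_j ~ P_X = Uniform[0, 1]
phi_m = np.column_stack([np.ones(n), x])          # toy Phi_m(x) = (1, x)
z0 = rng.uniform(0.0, 1.0, size=n)
y_inside = phi_m.T @ z0 / n                       # A z0 with z0 in [0, 1]^n
print(omega_event(phi_m, y_inside, eta=0.05))                 # event holds
print(omega_event(phi_m, np.array([10.0, 10.0]), eta=0.05))   # far from the zonotope A([0, 1]^n)
```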

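Denote by $\mathcal P(\mathbb R^n)$ the set of probability measures on $\mathbb R^n$. For any map $f : \mathcal X \to \mathbb R^k$, define the set

$$\Pi_n(f, K_Y) = \left\{\nu \in \mathcal P(\mathbb R^n) : \mathbb E_\nu\left[\int_{\mathcal X} f(x)\,dL_n(x)\right] \in K_Y\right\}.$$

Let $\nu^*_{m,n}$ be the I-projection of $\nu_Z^{\otimes n}$ on $\Pi_n(\Phi_m, K_Y)$.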

Then, on the event $\Omega_{m,n}$, we define the AMEM estimate $\hat\mu_{m,n}$ by

$$\hat\mu_{m,n} = \mathbb E_{\nu^*_{m,n}}[L_n], \tag{4}$$

and we extend the definition of $\hat\mu_{m,n}$ to the whole probability space by setting it to the null measure on the complement $\Omega^c_{m,n}$ of $\Omega_{m,n}$. In other words, letting $(\hat z_1, \ldots, \hat z_n)$ be the expectation of the measure $\nu^*_{m,n}$, the AMEM estimate may be rewritten more conveniently as

$$\hat\mu_{m,n} = \frac{1}{n}\sum_{i=1}^n \hat z_i\,\delta_{X_i} \tag{5}$$
with $\hat z_i = \mathbb E_{\nu^*_{m,n}}(Z_i)$ on $\Omega_{m,n}$, and as $\hat\mu_{m,n} = 0$ on $\Omega^c_{m,n}$. It is shown in Loubes and Pelletier (2008) that $P(\Omega_{m,n}) \to 1$ as $m \to \infty$ and $n \to \infty$. Hence for $m$ and $n$ large enough, the AMEM estimate $\hat\mu_{m,n}$ may be expressed as in (5) with high probability, and asymptotically with probability 1.

Remark 1. The construction of the AMEM estimate relies on a discretization of the space $\mathcal X$ according to the probability $P_X$. Therefore, by varying the support of $P_X$, the practitioner may easily incorporate some a priori knowledge concerning the support of the solution. Similarly, the AMEM estimate also depends on the measure $\nu_Z$, which determines the domain of $\Lambda^*_{\nu_Z}$, and so the range of the solution.



3 Convergence of the AMEM estimate 
3.1 Main Result 

Assumption 1 The minimization problem admits at least one solution, i.e., there exists a continuous function $g_0 : \mathcal X \to \mathrm{co}\,\mathrm{Supp}\,\nu_Z$ such that

$$\int_{\mathcal X} g_0(x)\Phi(x)\,dP_X(x) \in K_Y.$$

Assumption 2

(i) $\mathrm{dom}\,\Lambda_{\nu_Z} := \{s : |\Lambda_{\nu_Z}(s)| < \infty\} = \mathbb R$;

(ii) $\Lambda'_{\nu_Z}$ and $\Lambda''_{\nu_Z}$ are bounded.

Assumption 3 The approximating sequence $\Phi_m$ converges to $\Phi$ in $L^2(\mathcal X, P_X)$. Its rate of convergence is given by

$$\|\Phi_m - \Phi\|_{L^2} = O(\varphi_m^{-1}).$$

Assumption 4 $\Lambda_{\nu_Z}$ is a convex function.

Assumption 5 For all $m$, the components of $\Phi_m$ are linearly independent.

Assumption 6 $\Lambda'_{\nu_Z}$ and $\Lambda''_{\nu_Z}$ are continuous functions.
We are now in a position to state our main result.

Theorem 2 (Convergence of the AMEM estimate). Suppose that Assumption 1, Assumption 2, and Assumption 3 hold. Let $\mu^*$ be the minimizer of the functional $I_{\nu_Z}(\mu|P_X)$ subject to the constraint $\mu \in S(K_Y) = \left\{\mu \in \mathcal M(\mathcal X) : \int_{\mathcal X} \Phi(x)\,d\mu(x) \in K_Y\right\}$.
• Then the AMEM estimate $\hat\mu_{m,n}$ is defined by

$$\hat\mu_{m,n} = \frac{1}{n}\sum_{i=1}^n \Lambda'_{\nu_Z}\left(\langle \hat v_{m,n}, \Phi_m(X_i)\rangle\right)\delta_{X_i},$$

where $\hat v_{m,n}$ minimizes on $\mathbb R^k$

$$H_n(\Phi_m, v) = \frac{1}{n}\sum_{i=1}^n \Lambda_{\nu_Z}\left(\langle v, \Phi_m(X_i)\rangle\right) - \inf_{y \in K_Y} \langle v, y\rangle.$$

• Moreover, under Assumption 4, Assumption 2, and Assumption 3, it converges weakly to $\mu^*$ as $m \to \infty$ and $n \to \infty$. Its rate of convergence is given by

$$\|\hat\mu_{m,n} - \mu^*\|_{VT} = O_P(\varphi_m^{-1}) + O_P\left(\frac{1}{\sqrt n}\right).$$
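The first part of Theorem 2 reduces the construction to a finite-dimensional convex minimization over $v \in \mathbb R^k$. The sketch below implements it with NumPy under assumptions that are ours, not the paper's: $\nu_Z$ = Bernoulli(1/2), so that $\Lambda_{\nu_Z}(s) = \log((1+e^s)/2)$ and $\Lambda'_{\nu_Z}$ is the logistic function (bounded derivatives, as in Assumption 2); $K_Y$ the ball of radius $\eta$, so that $-\inf_{y\in K_Y}\langle v,y\rangle = -\langle v, y^{obs}\rangle + \eta\|v\|$ (as derived in the proof of Lemma 1); a toy map $\Phi_m(x) = (1, x + 0.5/m)$ on $[0,1]$; and a damped Newton solver with backtracking, which assumes the minimizer is not $v = 0$. The names `amem`, `log_laplace`, and `lambda_prime` are hypothetical.

```python
import numpy as np

def log_laplace(s):
    # Lambda_{nu_Z}(s) = log((1 + e^s) / 2) for the hypothetical nu_Z = Bernoulli(1/2)
    return np.logaddexp(0.0, s) - np.log(2.0)

def lambda_prime(s):
    # Lambda'_{nu_Z}(s): the logistic function, bounded as in Assumption 2
    return 1.0 / (1.0 + np.exp(-s))

def amem(phi, y_obs, eta, max_iter=100, tol=1e-10):
    """Minimize H_n(Phi_m, v) = mean_i Lambda(<v, Phi_m(X_i)>) - <v, y_obs> + eta ||v||.

    phi is the (n, k) array of the values Phi_m(X_i). Damped Newton with
    backtracking; assumes the minimizer is not v = 0, where ||v|| is not smooth.
    Returns v_hat and the weights z_hat_i = Lambda'(<v_hat, Phi_m(X_i)>).
    """
    n, k = phi.shape

    def objective(v):
        return log_laplace(phi @ v).mean() - v @ y_obs + eta * np.linalg.norm(v)

    v = np.ones(k)
    for _ in range(max_iter):
        p = lambda_prime(phi @ v)
        r = np.linalg.norm(v)
        grad = phi.T @ p / n - y_obs + eta * v / r
        if np.linalg.norm(grad) < tol:
            break
        # Hessian = empirical Gram part (analogue of M_1) + eta * Hessian of ||v|| (M_2)
        hess = (phi * (p * (1.0 - p))[:, None]).T @ phi / n
        hess += eta * (np.eye(k) / r - np.outer(v, v) / r**3)
        step = np.linalg.solve(hess, grad)
        t, f0 = 1.0, objective(v)
        while objective(v - t * step) > f0 - 1e-4 * t * (grad @ step) and t > 1e-12:
            t *= 0.5  # backtracking (Armijo) line search
        v = v - t * step
    return v, lambda_prime(phi @ v)

rng = np.random.default_rng(0)
n, m, eta = 2000, 50, 0.01
x = rng.uniform(0.0, 1.0, size=n)                    # X_i ~ P_X = Uniform[0, 1]
phi_m = np.column_stack([np.ones(n), x + 0.5 / m])   # hypothetical Phi_m(x) = (1, x + 0.5/m)
y_obs = np.array([0.5, 1.0 / 3.0])                   # moments of the density g(x) = x
v_hat, z_hat = amem(phi_m, y_obs, eta)
moment = phi_m.T @ z_hat / n                         # int Phi_m d(mu_hat_{m,n})
print(v_hat, np.linalg.norm(moment - y_obs))         # distance should be close to eta
```

At an interior minimizer $\hat v_{m,n} \neq 0$ the first-order condition reads $\int\Phi_m\,d\hat\mu_{m,n} = y^{obs} - \eta\,\hat v_{m,n}/\|\hat v_{m,n}\|$, so the reconstructed moment lies on the boundary of $K_Y$; the printed distance is therefore a convenient convergence check.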

Remark 2. Assumption 2-(i) ensures that the function $H(\Phi, v)$ in Theorem 2 attains its minimum at a unique point $v^*$ belonging to the interior of its domain. If this assumption is not met, Borwein and Lewis (1993) and Gamboa and Gassiat (1999) have shown that the minimizers of $I_{\nu_Z}(\mu|P_X)$ over $S(K_Y)$ may have a singular part with respect to $P_X$.

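Proof. The rate of convergence of the AMEM estimate depends both on the discretization level $n$ and on the convergence of the approximated operator, indexed by $m$. Hence we consider

$$v_{m,\infty} = \arg\min_{v \in \mathbb R^k} H(\Phi_m, v) = \arg\min_{v \in \mathbb R^k} \left\{\int_{\mathcal X} \Lambda_{\nu_Z}(\langle \Phi_m(x), v\rangle)\,dP_X(x) - \inf_{y \in K_Y}\langle v, y\rangle\right\},$$

$$\hat\mu_{m,n} = \frac{1}{n}\sum_{i=1}^n \Lambda'_{\nu_Z}(\langle \hat v_{m,n}, \Phi_m(X_i)\rangle)\,\delta_{X_i},$$

$$\mu_{m,\infty} = \Lambda'_{\nu_Z}(\langle \Phi_m(\cdot), v_{m,\infty}\rangle)\,P_X.$$

We have the following upper bound

$$\|\hat\mu_{m,n} - \mu^*\|_{VT} \le \|\hat\mu_{m,n} - \mu_{m,\infty}\|_{VT} + \|\mu_{m,\infty} - \mu^*\|_{VT},$$

where each term must be tackled separately.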
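First, let us consider $\|\hat\mu_{m,n} - \mu_{m,\infty}\|_{VT}$:

$$\begin{aligned}\|\hat\mu_{m,n} - \mu_{m,\infty}\|_{VT} &= \left\|\frac{1}{n}\sum_{i=1}^n \Lambda'_{\nu_Z}(\langle \Phi_m(X_i), \hat v_{m,n}\rangle)\,\delta_{X_i} - \Lambda'_{\nu_Z}(\langle \Phi_m, v_{m,\infty}\rangle)\,P_X\right\|_{VT}\\ &\le \left\|\frac{1}{n}\sum_{i=1}^n \left(\Lambda'_{\nu_Z}(\langle \Phi_m(X_i), \hat v_{m,n}\rangle) - \Lambda'_{\nu_Z}(\langle \Phi_m(X_i), v_{m,\infty}\rangle)\right)\delta_{X_i}\right\|_{VT}\\ &\quad + \left\|\frac{1}{n}\sum_{i=1}^n \Lambda'_{\nu_Z}(\langle \Phi_m(X_i), v_{m,\infty}\rangle)\,\delta_{X_i} - \Lambda'_{\nu_Z}(\langle \Phi_m, v_{m,\infty}\rangle)\,P_X\right\|_{VT}.\end{aligned}$$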
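To bound the first term, $\left\|\frac{1}{n}\sum_{i=1}^n \left(\Lambda'_{\nu_Z}(\langle \Phi_m(X_i), \hat v_{m,n}\rangle) - \Lambda'_{\nu_Z}(\langle \Phi_m(X_i), v_{m,\infty}\rangle)\right)\delta_{X_i}\right\|_{VT}$, let $g$ be a bounded measurable function and write

$$\begin{aligned}\frac{1}{n}\sum_{i=1}^n g(X_i)\left(\Lambda'_{\nu_Z}(\langle \Phi_m(X_i), \hat v_{m,n}\rangle) - \Lambda'_{\nu_Z}(\langle \Phi_m(X_i), v_{m,\infty}\rangle)\right) &\le \|g\|_\infty \|\Lambda''_{\nu_Z}\|_\infty \frac{1}{n}\sum_{i=1}^n \left|\langle \Phi_m(X_i), \hat v_{m,n} - v_{m,\infty}\rangle\right|\\ &\le \|g\|_\infty \|\Lambda''_{\nu_Z}\|_\infty \left(\frac{1}{n}\sum_{i=1}^n \|\Phi_m(X_i)\|\right)\|\hat v_{m,n} - v_{m,\infty}\|,\end{aligned}$$

where we have used the mean value theorem and the Cauchy-Schwarz inequality. Since $(\Phi_m)_m$ converges in $L^2(P_X)$, it is bounded in $L^2(P_X)$-norm, yielding that $\frac{1}{n}\sum_{i=1}^n \|\Phi_m(X_i)\|$ converges almost surely to $\mathbb E\|\Phi_m(X)\| < \infty$. Hence, there exists $K_1 > 0$ such that

$$\left\|\frac{1}{n}\sum_{i=1}^n \left(\Lambda'_{\nu_Z}(\langle \Phi_m(X_i), \hat v_{m,n}\rangle) - \Lambda'_{\nu_Z}(\langle \Phi_m(X_i), v_{m,\infty}\rangle)\right)\delta_{X_i}\right\|_{VT} \le K_1\|\hat v_{m,n} - v_{m,\infty}\|.$$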

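For the second term, we obtain

$$\left\|\frac{1}{n}\sum_{i=1}^n \Lambda'_{\nu_Z}(\langle \Phi_m(X_i), v_{m,\infty}\rangle)\,\delta_{X_i} - \Lambda'_{\nu_Z}(\langle \Phi_m, v_{m,\infty}\rangle)\,P_X\right\|_{VT} = O_P\left(\frac{1}{\sqrt n}\right).$$

Hence we get

$$\|\hat\mu_{m,n} - \mu_{m,\infty}\|_{VT} \le K_1\|\hat v_{m,n} - v_{m,\infty}\| + O_P\left(\frac{1}{\sqrt n}\right).$$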

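The second step is to consider $\|\mu_{m,\infty} - \mu^*\|_{VT}$, where $\mu^* = \Lambda'_{\nu_Z}(\langle \Phi(\cdot), v^*\rangle)\,P_X$, and to follow the same guidelines. So, we get

$$\begin{aligned}\|\mu_{m,\infty} - \mu^*\|_{VT} &= \left\|\left(\Lambda'_{\nu_Z}(\langle \Phi_m, v_{m,\infty}\rangle) - \Lambda'_{\nu_Z}(\langle \Phi, v^*\rangle)\right)P_X\right\|_{VT}\\ &\le \left\|\left(\Lambda'_{\nu_Z}(\langle \Phi_m, v_{m,\infty}\rangle) - \Lambda'_{\nu_Z}(\langle \Phi_m, v^*\rangle)\right)P_X\right\|_{VT}\\ &\quad + \left\|\left(\Lambda'_{\nu_Z}(\langle \Phi_m, v^*\rangle) - \Lambda'_{\nu_Z}(\langle \Phi, v^*\rangle)\right)P_X\right\|_{VT}.\end{aligned}$$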

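For any bounded measurable function $g$, we can write, still using the Cauchy-Schwarz inequality,

$$\begin{aligned}\int_{\mathcal X} g(x)\left(\Lambda'_{\nu_Z}(\langle \Phi_m(x), v_{m,\infty}\rangle) - \Lambda'_{\nu_Z}(\langle \Phi_m(x), v^*\rangle)\right)dP_X(x) &= \int_{\mathcal X} g(x)\Lambda''_{\nu_Z}(\xi_x)\langle \Phi_m(x), v_{m,\infty} - v^*\rangle\,dP_X(x)\\ &\le \|\Lambda''_{\nu_Z}\|_\infty \sqrt{\mathbb E(g(X)^2)}\sqrt{\mathbb E(\|\Phi_m(X)\|^2)}\,\|v_{m,\infty} - v^*\|,\end{aligned}$$

for some $\xi_x$ between $\langle \Phi_m(x), v_{m,\infty}\rangle$ and $\langle \Phi_m(x), v^*\rangle$.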
Hence there exists $K_2 > 0$ such that

$$\left\|\left(\Lambda'_{\nu_Z}(\langle \Phi_m, v_{m,\infty}\rangle) - \Lambda'_{\nu_Z}(\langle \Phi_m, v^*\rangle)\right)P_X\right\|_{VT} \le K_2\|v_{m,\infty} - v^*\|.$$

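Finally, the last term $\left\|\left(\Lambda'_{\nu_Z}(\langle \Phi_m, v^*\rangle) - \Lambda'_{\nu_Z}(\langle \Phi, v^*\rangle)\right)P_X\right\|_{VT}$ can be bounded. Indeed, for any bounded measurable $g$,

$$\begin{aligned}\int_{\mathcal X} g(x)\left(\Lambda'_{\nu_Z}(\langle \Phi_m(x), v^*\rangle) - \Lambda'_{\nu_Z}(\langle \Phi(x), v^*\rangle)\right)dP_X(x) &= \int_{\mathcal X} g(x)\Lambda''_{\nu_Z}(\xi_x)\langle \Phi_m(x) - \Phi(x), v^*\rangle\,dP_X(x)\\ &\le \int_{\mathcal X} |g(x)|\,\|\Lambda''_{\nu_Z}\|_\infty \|\Phi_m(x) - \Phi(x)\|\,\|v^*\|\,dP_X(x)\\ &\le \|v^*\|\,\|\Lambda''_{\nu_Z}\|_\infty \sqrt{\mathbb E(g(X)^2)}\sqrt{\mathbb E(\|\Phi_m(X) - \Phi(X)\|^2)}.\end{aligned}$$

Hence there exists $K_3 > 0$ such that

$$\left\|\left(\Lambda'_{\nu_Z}(\langle \Phi_m, v^*\rangle) - \Lambda'_{\nu_Z}(\langle \Phi, v^*\rangle)\right)P_X\right\|_{VT} \le K_3\|\Phi_m - \Phi\|_{L^2}.$$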

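We finally obtain the following bound

$$\|\hat\mu_{m,n} - \mu^*\|_{VT} \le K_1\|\hat v_{m,n} - v_{m,\infty}\| + K_2\|v_{m,\infty} - v^*\| + K_3\|\Phi_m - \Phi\|_{L^2} + O_P\left(\frac{1}{\sqrt n}\right).$$

Using Lemmas 1 and 2, we obtain that

$$\|\hat v_{m,n} - v_{m,\infty}\| = O_P\left(\frac{1}{\sqrt n}\right), \qquad \|v_{m,\infty} - v^*\| = O_P(\varphi_m^{-1}).$$

Since $\|\Phi_m - \Phi\|_{L^2} = O(\varphi_m^{-1})$ by Assumption 3, we finally get

$$\|\hat\mu_{m,n} - \mu^*\|_{VT} = O_P(\varphi_m^{-1}) + O_P\left(\frac{1}{\sqrt n}\right),$$

which proves the result. □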

3.2 Application to remote sensing 

In remote sensing of aerosol vertical profiles, one wishes to recover the concentration of aerosol particles from noisy observations of the radiance field (i.e., a radiometric quantity), in several spectral bands (see e.g. Gabella et al, 1997; Gabella, Kisselev and Perona, 1999). More specifically, at a given level of modeling, the noisy observation $y^{obs}$ may be expressed as

$$y^{obs} = \int_{\mathcal X} \Phi(x; t^{obs})\,d\mu_X(x) + \varepsilon, \tag{6}$$

where $\Phi : \mathcal X \times \mathcal T \to \mathbb R^k$ is a given operator, and where $t^{obs}$ is a vector of angular parameters observed simultaneously with $y^{obs}$. The aerosol vertical profile is a function of the altitude $x$ and is associated with the measure $\mu_X$ to be recovered, i.e., the aerosol vertical profile is the Radon-Nikodym derivative of $\mu_X$ with respect to a given reference measure (e.g., the Lebesgue measure on $\mathbb R$). The analytical expression of $\Phi$ is fairly complex as it sums up several models at the microphysical scale, so that basically $\Phi$ is available in the form of a computer code. This problem therefore motivates the introduction of an efficient numerical procedure for recovering the unknown $\mu_X$ from $y^{obs}$ and arbitrary $t^{obs}$.

More generally, the remote sensing of the aerosol vertical profile takes the form of an inverse problem where some of the inputs (namely $t^{obs}$) are observed simultaneously with the noisy output $y^{obs}$. Suppose that random points $X_1, \ldots, X_n$ of $\mathcal X$ have been generated. Then, applying the maximum entropy approach would require the evaluations of $\Phi(X_i, t^{obs})$ each time $t^{obs}$ is observed. If one wishes to process a large number of observations, say $(y_i^{obs}, t_i^{obs})$, for different values $t_i^{obs}$, the computational cost may become prohibitive. So we propose to replace $\Phi$ by an approximation $\Phi_m$, the evaluation of which is faster in execution. To this aim, suppose first that $\mathcal T$ is a subset of $\mathbb R^p$. Let $T_1, \ldots, T_m$ be random points of $\mathcal T$, independent of $X_1, \ldots, X_n$, and drawn from some probability measure $\mu_T$ on $\mathcal T$ admitting a density $f_T$ with respect to the Lebesgue measure on $\mathbb R^p$ such that $f_T(t) > 0$ for all $t \in \mathcal T$. Next, consider the operator

$$\Phi_m(x, t) = \frac{1}{m}\sum_{j=1}^m \frac{1}{f_T(T_j)} K_{h_m}(t - T_j)\,\Phi(x, T_j),$$

where $K_{h_m}(\cdot)$ is a symmetric kernel on $\mathcal T$ with smoothing sequence $h_m$. It is a classical result that $\Phi_m$ converges to $\Phi$ in quadratic norm provided $h_m$ tends to 0 at a suitable rate, which ensures that Assumption 3 of Theorem 2 is satisfied. Since the $T_j$'s are independent of the $X_i$'s, one may see that Theorem 2 applies, and so the solution to the approximate inverse problem

$$y^{obs} = \int_{\mathcal X} \Phi_m(x; t^{obs})\,d\mu_X(x) + \varepsilon$$

will converge to the solution to the original inverse problem in Eq. (6). In terms of computational complexity, the advantage of this approach is that the construction of the AMEM estimate requires, for each new observation $(y^{obs}, t^{obs})$, only the evaluation of the $m$ kernels at $t^{obs}$, i.e., $K_{h_m}(t^{obs} - T_j)$, the $m \times n$ outputs $\Phi(X_i, T_j)$ for $i = 1, \ldots, n$ and $j = 1, \ldots, m$ having been evaluated once and for all.
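The offline/online split described above can be sketched with NumPy. Everything specific here is a hypothetical stand-in: the cheap function `phi_true` plays the role of the expensive forward code, $\mathcal T = [0, 1]$ with $\mu_T$ uniform (so $f_T \equiv 1$), and $K$ is a Gaussian kernel with a fixed bandwidth $h$. The table $\Phi(X_i, T_j)$ is computed once; each new $t^{obs}$ then costs only $m$ kernel evaluations and one matrix-vector product.

```python
import numpy as np

def gaussian_kernel(u, h):
    """K_h(u) = K(u / h) / h with the standard Gaussian density K (a symmetric kernel)."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def phi_true(x, t):
    """Hypothetical stand-in for the expensive forward code Phi(x; t)."""
    return np.exp(-x * t)

rng = np.random.default_rng(1)
n, m, h = 5, 200_000, 0.05
X = rng.uniform(0.0, 2.0, size=n)   # discretization points X_1, ..., X_n
T = rng.uniform(0.0, 1.0, size=m)   # T_1, ..., T_m ~ mu_T = Uniform[0, 1]
f_T = np.ones(m)                    # density of mu_T evaluated at the T_j

# Offline step, done once and for all: the n x m table Phi(X_i, T_j).
table = phi_true(X[:, None], T[None, :])

def phi_m(t_obs):
    """Phi_m(X_i; t_obs) for every i: only the m kernel weights depend on t_obs."""
    weights = gaussian_kernel(t_obs - T, h) / f_T
    return table @ weights / m

t_obs = 0.5
error = np.max(np.abs(phi_m(t_obs) - phi_true(X, t_obs)))
print(error)  # small for t_obs away from the boundary of [0, 1]
```

Near the boundary of $\mathcal T$ this plain kernel estimator loses part of the kernel mass, so in practice $h_m$ and the treatment of boundary points would need more care than in this sketch.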
3.3 Application to deconvolution type problems in optical nanoscopy

Following the framework defined in [11], the number of photons counted can be expressed using a convolution of $p(x - y, y)$, the probability of recording a photon emission at point $y$ when illuminating point $x$, with $d\mu(y) = f(y)\,dy$, the measure of the fluorescent markers:

$$g(x) = \int p(x - y, x) f(y)\,dy.$$

Here $p(x - y, y) = p(x, y, \phi(x))$. Reconstruction of $\mu$ can be achieved using AMEM techniques.



4 Technical Lemmas



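Recall the following definitions:

$$v_{m,\infty} = \arg\min_{v \in \mathbb R^k} H(\Phi_m, v) = \arg\min_{v \in \mathbb R^k}\left\{\int_{\mathcal X} \Lambda_{\nu_Z}(\langle \Phi_m(x), v\rangle)\,dP_X(x) - \inf_{y \in K_Y}\langle v, y\rangle\right\},$$

$$\hat v_{m,n} = \arg\min_{v \in \mathbb R^k} H_n(\Phi_m, v) = \arg\min_{v \in \mathbb R^k}\left\{\frac{1}{n}\sum_{i=1}^n \Lambda_{\nu_Z}(\langle v, \Phi_m(X_i)\rangle) - \inf_{y \in K_Y}\langle v, y\rangle\right\},$$

$$v^* = \arg\min_{v \in \mathbb R^k} H(\Phi, v) = \arg\min_{v \in \mathbb R^k}\left\{\int_{\mathcal X} \Lambda_{\nu_Z}(\langle \Phi(x), v\rangle)\,dP_X(x) - \inf_{y \in K_Y}\langle v, y\rangle\right\}.$$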



Lemma 1 (Uniform convergence at a given approximation level m). For all $m$, we get

$$\|\hat v_{m,n} - v_{m,\infty}\| = O_P\left(\frac{1}{\sqrt n}\right).$$

Proof. $\hat v_{m,n}$ is defined as the minimizer of an empirical contrast function $H_n(\Phi_m, \cdot)$. Indeed, set

$$h_m(v, x) = \Lambda_{\nu_Z}(\langle \Phi_m(x), v\rangle) - \inf_{y \in K_Y}\langle v, y\rangle,$$

hence

$$H(\Phi_m, v) = P_X h_m(v, \cdot).$$

Using a classical theorem from the theory of M-estimation, we get the convergence in probability of $\hat v_{m,n}$ towards $v_{m,\infty}$ provided that the contrast converges uniformly over every compact set of $\mathbb R^k$ towards $H(\Phi_m, \cdot)$ when $n \to \infty$. More precisely, Corollary 5.53 in van der Vaart (1998) states the following. Let $x \mapsto h_m(v, x)$ be a measurable function and $\dot h_m$ a function in $L^2(P)$ such that, for all $v_1$ and $v_2$ in a neighbourhood of $v^*$,

$$|h_m(v_1, x) - h_m(v_2, x)| \le \dot h_m(x)\|v_1 - v_2\|.$$

Moreover, assume that $v \mapsto P h_m(v, \cdot)$ has a Taylor expansion of order at least 2 around its unique minimum $v^*$ and that the Hessian matrix at this point is positive. Then, provided $\hat v_n$ converges in probability to $v^*$ and $P_n h_m(\hat v_n, \cdot) \le P_n h_m(v^*, \cdot) + O_P(n^{-1})$,

$$\sqrt n(\hat v_n - v^*) = O_P(1).$$

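We want to apply this result to our problem. Let $\eta$ be an upper bound for $\|\varepsilon\|$; we set $h_m(v, x) = \Lambda_{\nu_Z}(\langle \Phi_m(x), v\rangle) - \langle v, y^{obs}\rangle - \inf_{\|y - y^{obs}\| \le \eta}\langle v, y - y^{obs}\rangle$. Now note that $z \mapsto \langle v, z\rangle$ reaches its minimum on $B(0, \eta)$ at the point $-\eta\,\frac{v}{\|v\|}$, so

$$h_m(v, x) = \Lambda_{\nu_Z}(\langle \Phi_m(x), v\rangle) - \langle v, y^{obs}\rangle + \eta\|v\|.$$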
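The closed form $\inf_{y \in K_Y}\langle v, y\rangle = \langle v, y^{obs}\rangle - \eta\|v\|$ used above can be checked numerically. A small NumPy sketch, with $v$, $y^{obs}$, and $\eta$ drawn or chosen purely for illustration, compares it with a brute-force minimum over points sampled on the sphere $\|y - y^{obs}\| = \eta$ (a linear function attains its minimum over a ball on the boundary):

```python
import numpy as np

def inf_over_ball(v, y_obs, eta):
    """Closed form of inf_{||y - y_obs|| <= eta} <v, y>."""
    return v @ y_obs - eta * np.linalg.norm(v)

rng = np.random.default_rng(2)
v, y_obs, eta = rng.normal(size=3), rng.normal(size=3), 0.3

# Brute force over the sphere ||y - y_obs|| = eta.
u = rng.normal(size=(100_000, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)      # uniform directions on the unit sphere
values = (y_obs + eta * u) @ v                      # <v, y> for the sampled y
minimizer = y_obs - eta * v / np.linalg.norm(v)     # y_obs shifted by -eta v / ||v||
print(inf_over_ball(v, y_obs, eta), values.min(), minimizer @ v)
```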
For all $v_1, v_2 \in \mathbb R^k$, we have

$$\begin{aligned}|h_m(v_1, x) - h_m(v_2, x)| &= \left|\Lambda_{\nu_Z}(\langle \Phi_m(x), v_1\rangle) - \inf_{y \in K_Y}\langle v_1, y\rangle - \Lambda_{\nu_Z}(\langle \Phi_m(x), v_2\rangle) + \inf_{y \in K_Y}\langle v_2, y\rangle\right|\\ &\le \left|\Lambda_{\nu_Z}(\langle \Phi_m(x), v_1\rangle) - \Lambda_{\nu_Z}(\langle \Phi_m(x), v_2\rangle)\right| + \left|\langle v_2 - v_1, y^{obs}\rangle - \eta(\|v_2\| - \|v_1\|)\right|\\ &\le \left(\|\Lambda'_{\nu_Z}\|_\infty\|\Phi_m(x)\| + \|y^{obs}\| + \eta\right)\|v_1 - v_2\|.\end{aligned}$$

Define $\dot h_m : x \mapsto \|\Lambda'_{\nu_Z}\|_\infty\|\Phi_m(x)\| + \|y^{obs}\| + \eta$. Since $(\Phi_m)_m$ is bounded in $L^2(P_X)$ uniformly with respect to $m$, we have

$$\exists K, \forall m, \quad \int_{\mathcal X} \dot h_m^2\,dP_X \le K. \tag{7}$$

Hence the function $h_m$ satisfies the first condition

$$|h_m(v_1, x) - h_m(v_2, x)| \le \dot h_m(x)\|v_1 - v_2\|.$$

Now, consider $H(\Phi_m, \cdot)$. Let $V_{m,v}$ be the Hessian matrix of $H(\Phi_m, \cdot)$ at point $v$. We need to prove that $V_{m,v_{m,\infty}}$ is non-negative. Let $\partial_i$ be the derivative with respect to the $i$-th component. For $v \neq 0$, we have

$$V_{m,v}(i,j) = \partial_i\partial_j H(\Phi_m, v) = \int_{\mathcal X} \partial_i\partial_j h_m(v, x)\,dP_X(x) = \int_{\mathcal X} \Phi_m^i(x)\Phi_m^j(x)\Lambda''_{\nu_Z}(\langle \Phi_m(x), v\rangle)\,dP_X(x) + \eta\,\partial_i\partial_j N(v),$$

where $N : v \mapsto \|v\|$.

Hence the Hessian matrix $V_{m,v_{m,\infty}}$ of $H(\Phi_m, \cdot)$ at point $v_{m,\infty}$ can be split into the sum



Regularization with Approximated L 2 Maximum Entropy Method 1 3 

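of the following matrices

$$(M_1)_{ij} = \int_{\mathcal X} \Phi_m^i(x)\Phi_m^j(x)\Lambda''_{\nu_Z}(\langle \Phi_m(x), v_{m,\infty}\rangle)\,dP_X(x), \qquad (M_2)_{ij} = \partial_i\partial_j N(v_{m,\infty}).$$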

Under Assumptions 3 and 5, $\Lambda''_{\nu_Z}$ is positive and belongs to $L^1(P_X)$ since it is bounded. So we can define $\int_{\mathcal X} \Phi_m^i(x)\Phi_m^j(x)\Lambda''_{\nu_Z}(\langle \Phi_m(x), v_{m,\infty}\rangle)\,dP_X$ as the scalar product of $\Phi_m^i$ and $\Phi_m^j$ in the space $L^2\left(\Lambda''_{\nu_Z}(\langle \Phi_m(\cdot), v_{m,\infty}\rangle)P_X\right)$. $M_1$ is a Gram matrix; hence, using Assumption 6, it is a non-negative matrix.
$M_2$ can be computed as follows. For all $v \in \mathbb R^k \setminus \{0\}$, we have

$$\partial_i\partial_j N(v) = \frac{\delta_{ij}}{\|v\|} - \frac{v_i v_j}{\|v\|^3}.$$

Hence for all $a \in \mathbb R^k$, we can write

$$a^T M_2 a = \sum_{i=1}^k \frac{a_i^2}{\|v_{m,\infty}\|} - \sum_{1 \le i,j \le k} \frac{a_i a_j v_{m,\infty,i} v_{m,\infty,j}}{\|v_{m,\infty}\|^3} = \frac{1}{\|v_{m,\infty}\|^3}\left(\|v_{m,\infty}\|^2\|a\|^2 - \langle a, v_{m,\infty}\rangle^2\right) \ge 0,$$

using the Cauchy-Schwarz inequality.



So $M_2$ is non-negative, hence $V_{m,v_{m,\infty}} = M_1 + \eta M_2$ is also non-negative. Finally we conclude that $H(\Phi_m, \cdot)$ satisfies the assumptions of Theorem 5.1. □
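The Hessian of $N(v) = \|v\|$ used for $M_2$ can be cross-checked numerically. The NumPy sketch below (the test vector $v$ is arbitrary) compares the closed form with a central finite-difference Hessian and prints its eigenvalues: one is 0, in the direction of $v$, and the others equal $1/\|v\|$. This is consistent with the non-negativity shown above, and shows that definiteness along $v$ must come from $M_1$.

```python
import numpy as np

def norm_hessian(v):
    """Closed-form Hessian of N(v) = ||v|| for v != 0: I / ||v|| - v v^T / ||v||^3."""
    r = np.linalg.norm(v)
    return np.eye(len(v)) / r - np.outer(v, v) / r**3

def numerical_hessian(f, v, h=1e-4):
    """Central finite-difference Hessian, used only as a cross-check."""
    k = len(v)
    H = np.zeros((k, k))
    E = np.eye(k) * h
    for i in range(k):
        for j in range(k):
            H[i, j] = (f(v + E[i] + E[j]) - f(v + E[i] - E[j])
                       - f(v - E[i] + E[j]) + f(v - E[i] - E[j])) / (4.0 * h * h)
    return H

v = np.array([1.0, -2.0, 0.5])
M2 = norm_hessian(v)
print(np.linalg.eigvalsh(M2))   # one zero eigenvalue (direction v), the others 1 / ||v||
print(np.abs(M2 - numerical_hessian(np.linalg.norm, v)).max())
```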

Lemma 2.

$$\|v_{m,\infty} - v^*\| = O_P(\varphi_m^{-1}).$$

Proof. First write, 



14 J-M. Loubes and P. Rochet 

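$$|H(\Phi_m, v) - H(\Phi, v)| = \left|\int_{\mathcal X} \left(\Lambda_{\nu_Z}(\langle \Phi_m(x), v\rangle) - \Lambda_{\nu_Z}(\langle \Phi(x), v\rangle)\right)dP_X(x)\right| \le \|\Lambda'_{\nu_Z}\|_\infty\|v\|\,\|\Phi_m - \Phi\|_{L^2},$$

which implies uniform convergence over every compact set of $H(\Phi_m, \cdot)$ towards $H(\Phi, \cdot)$ when $m \to \infty$, yielding that $v_{m,\infty} \to v^*$ in probability. To compute the rate of convergence, we use Lemma 3. As previously, we can show that the Hessian matrix of $H(\Phi, \cdot)$ at point $v^*$ is positive. We need to prove uniform convergence of $\nabla H(\Phi_m, \cdot)$ towards $\nabla H(\Phi, \cdot)$. For this, write

$$\begin{aligned}\partial_i\{H(\Phi_m, \cdot) - H(\Phi, \cdot)\}(v) &= \int_{\mathcal X}\left(\Phi_m^i(x)\Lambda'_{\nu_Z}(\langle \Phi_m(x), v\rangle) - \Phi^i(x)\Lambda'_{\nu_Z}(\langle \Phi(x), v\rangle)\right)dP_X(x)\\ &= \int_{\mathcal X}\left(\Phi_m^i(x) - \Phi^i(x)\right)\Lambda'_{\nu_Z}(\langle \Phi_m(x), v\rangle)\,dP_X(x) + \int_{\mathcal X}\Phi^i(x)\Lambda''_{\nu_Z}(\xi_x)\langle \Phi_m(x) - \Phi(x), v\rangle\,dP_X(x)\\ &\le \|\Phi_m - \Phi\|_{L^2}\|\Lambda'_{\nu_Z}\|_\infty + \|\Phi\|_{L^2}\|\Lambda''_{\nu_Z}\|_\infty\|\Phi_m - \Phi\|_{L^2}\|v\|,\end{aligned}$$

using again the Cauchy-Schwarz inequality. Finally we obtain

$$\left\|\nabla\left(H(\Phi_m, \cdot) - H(\Phi, \cdot)\right)(v)\right\| \le (C_1 + C_2\|v\|)\,\|\Phi_m - \Phi\|_{L^2}$$

for positive constants $C_1$ and $C_2$. For any compact neighbourhood $\mathcal S$ of $v^*$, the function $v \mapsto \|\nabla(H(\Phi_m, \cdot) - H(\Phi, \cdot))(v)\|$ converges uniformly to 0 on $\mathcal S$. But for $m$ large enough, $v_{m,\infty} \in \mathcal S$ almost surely. Applying part 2 of Lemma 3 with the function $v \mapsto \|\nabla(H(\Phi_m, \cdot) - H(\Phi, \cdot))(v)\|\,\mathbb 1_{\mathcal S}(v)$, which converges uniformly to 0, yields

$$\|v_{m,\infty} - v^*\| = O_P(\varphi_m^{-1}). \qquad \square$$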
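The first inequality of this proof, $|H(\Phi_m, v) - H(\Phi, v)| \le \|\Lambda'_{\nu_Z}\|_\infty\|v\|\,\|\Phi_m - \Phi\|_{L^2}$, can be checked on a sample standing in for $P_X$. The sketch assumes $\nu_Z$ = Bernoulli(1/2) (so $\|\Lambda'_{\nu_Z}\|_\infty = 1$), $\Phi(x) = (1, x)$, and a hypothetical approximation $\Phi_m(x) = (1, x + \sin(mx)/m)$; the $\inf_{y \in K_Y}$ terms cancel in the difference, so only the $\Lambda_{\nu_Z}$ terms are computed. The worst observed ratio of the left side to the right side stays below 1, and $\|\Phi_m - \Phi\|_{L^2}$ decays like $1/m$, i.e. $\varphi_m$ of order $m$ in the notation of Assumption 3.

```python
import numpy as np

def log_laplace(s):
    # Lambda_{nu_Z}(s) = log((1 + e^s) / 2) for the hypothetical nu_Z = Bernoulli(1/2)
    return np.logaddexp(0.0, s) - np.log(2.0)

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, size=10_000)          # sample standing in for P_X
phi = np.column_stack([np.ones_like(x), x])      # toy Phi(x) = (1, x)

errors, ratios = [], []
for m in (5, 50, 500):
    # hypothetical approximation Phi_m(x) = (1, x + sin(m x) / m)
    phi_m = phi + np.column_stack([np.zeros_like(x), np.sin(m * x) / m])
    l2_err = np.sqrt(np.mean(np.sum((phi_m - phi) ** 2, axis=1)))  # ||Phi_m - Phi||_{L^2}
    worst = 0.0
    for _ in range(100):
        v = 5.0 * rng.normal(size=2)
        gap = abs(np.mean(log_laplace(phi_m @ v) - log_laplace(phi @ v)))  # |H(Phi_m, v) - H(Phi, v)|
        worst = max(worst, gap / (np.linalg.norm(v) * l2_err))
    errors.append(l2_err)
    ratios.append(worst)
    print(m, l2_err, worst)  # worst ratio stays <= ||Lambda'||_inf = 1
```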

Lemma 3. Let $f : \mathcal S \subset \mathbb R^d \to \mathbb R$ reach a unique minimum at point $\theta_0$. Let $(f_n)_n$ be a sequence of continuous functions which converges uniformly towards $f$, and let $\theta_n = \arg\min f_n$. If $f$ is twice differentiable on a neighbourhood of $\theta_0$ and its Hessian matrix $V_{\theta_0}$ is non-negative, then:

1. there exists a positive constant $C$ such that

$$\|\theta_n - \theta_0\| \le C\sqrt{\|f - f_n\|_\infty};$$

2. moreover, if $\theta \mapsto V_\theta$ is continuous in a neighbourhood of $\theta_0$ and $\|\nabla f_n(\cdot)\|$ uniformly converges towards $\|\nabla f(\cdot)\|$, then there exists a constant $C'$ such that

$$\|\theta_n - \theta_0\| \le C'\|\nabla(f - f_n)\|_\infty,$$

with $\|g\|_\infty = \sup_x \|g(x)\|$.

Proof. The proof of this classical result in optimization relies on easy convex analysis tricks. For the sake of completeness, we recall here the main guidelines.
1. There are positive constants $C_1$ and $\delta_0$ such that




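$$\forall\, 0 < \delta \le \delta_0, \quad \inf_{d(\theta, \theta_0) > \delta} f(\theta) - f(\theta_0) > C_1\delta^2.$$

Set $\|f_n - f\|_\infty = \varepsilon_n$. For $0 < \delta_1 < \delta_0$, let $n$ be chosen such that $2\varepsilon_n \le C_1\delta_1^2$. Hence

$$\inf_{d(\theta, \theta_0) > \delta_1} f_n(\theta) \ge \inf_{d(\theta, \theta_0) > \delta_1} f(\theta) - \varepsilon_n > f(\theta_0) + \varepsilon_n \ge f_n(\theta_0).$$

Finally, $f_n(\theta_0) < \inf_{d(\theta, \theta_0) > \delta_1} f_n(\theta) \Rightarrow \theta_n \in \{\theta : d(\theta, \theta_0) \le \delta_1\}$; choosing $\delta_1 = \sqrt{2\varepsilon_n / C_1}$ enables us to conclude by setting $C = \sqrt{2 / C_1}$.

2. We prove the result for $d = 1$, which can be easily extended to all $d$. Using a Taylor-Lagrange expansion, there exists $\tilde\theta_n \in \,]\theta_n, \theta_0[$ such that

$$0 = f'(\theta_0) = f'(\theta_n) + (\theta_0 - \theta_n)f''(\tilde\theta_n).$$

Recall that $f''(\tilde\theta_n) \to f''(\theta_0) > 0$. Since $f_n'(\theta_n) = 0$, we have $|f'(\theta_n)| = |f'(\theta_n) - f_n'(\theta_n)|$, so, for $n$ large enough, there exists $C' > 0$ such that

$$|\theta_n - \theta_0| = \frac{|f'(\theta_n) - f'(\theta_0)|}{f''(\tilde\theta_n)} \le C'\|f' - f_n'\|_\infty,$$

which ends the proof. □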
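The two parts of Lemma 3 give different rates, which is why the proof of Lemma 2 relies on part 2. A NumPy illustration with $f(\theta) = \theta^2$ on $[-1, 1]$ (so $\theta_0 = 0$, $f''(\theta_0) = 2$, and part 1 holds with $C = \sqrt 2$) and a hypothetical smooth perturbation $f_n = f + \varepsilon\sin(7\theta + 1)$: here $\|f - f_n\|_\infty \le \varepsilon$ and $\|f' - f_n'\|_\infty \le 7\varepsilon$. The grid minimizer obeys the square-root bound $\sqrt{2\varepsilon}$ of part 1 and, up to the grid spacing, the much sharper linear bound $7\varepsilon/2$ of part 2.

```python
import numpy as np

theta = np.linspace(-1.0, 1.0, 200_001)   # grid on [-1, 1] with spacing 1e-5
f = theta ** 2                             # unique minimum at theta_0 = 0, f''(0) = 2

results = []
for eps in (1e-2, 1e-3, 1e-4):
    f_n = f + eps * np.sin(7.0 * theta + 1.0)  # hypothetical perturbation, uniformly eps-close
    theta_n = theta[np.argmin(f_n)]            # grid minimizer of f_n
    bound_1 = np.sqrt(2.0 * eps)               # part 1: sqrt(2 ||f - f_n||_inf) for f = theta^2
    bound_2 = 7.0 * eps / 2.0                  # part 2: ||f' - f_n'||_inf / f''(theta_0)
    results.append((eps, abs(theta_n), bound_1, bound_2))
    print(f"eps={eps:.0e}  |theta_n|={abs(theta_n):.2e}  part1={bound_1:.2e}  part2={bound_2:.2e}")
```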